Intraclass Correlation Values for Planning Group Randomized Trials in Education

Authors

  • Larry V. Hedges
  • Eric C. Hedberg
Abstract

Experiments that assign intact groups to treatment conditions are increasingly common in social research. In educational research, the groups assigned are often schools. The design of group randomized experiments requires knowledge of the intraclass correlation structure to compute statistical power and the sample sizes required to achieve adequate power. This paper provides a compilation of intraclass correlation values of academic achievement and related covariate effects that could be used for planning group randomized experiments in education. It also provides variance component information that is useful in planning experiments involving covariates. The use of these values to compute the statistical power of group randomized experiments is illustrated.

Many social interventions operate at the group level by altering physical or social conditions. In such cases, it may be difficult or impossible to assign individuals to different intervention conditions, so field experiments often assign entire intact groups (such as sites, classrooms, or schools) to the same treatment, with different intact groups assigned to different treatments. Because these intact groups correspond to what statisticians call clusters in sampling theory, this design is often called a group randomized or cluster randomized design. Cluster randomized trials have been used extensively in public health and other areas of prevention science (see, e.g., Donner and Klar, 2000; and Murray, 1998). They have become more important in educational research recently, following increased interest in experiments to evaluate educational interventions (see, e.g., Mosteller and Boruch, 2002). Methods for the design and analysis of group randomized trials have been discussed extensively in Donner and Klar (2000) and Murray (1998).
The sampling of subjects into experiments via statistical clusters introduces special considerations that need to be addressed in the analysis. For example, a sample obtained from m clusters (such as classrooms or schools) of size n randomized into a treatment group is not a simple random sample of nm individuals, even if it is based on a simple random sample of clusters. Consequently, the sampling distribution of statistics based on such clustered samples is not the same as that of statistics based on simple random samples of the same size. For example, suppose that the (total) variance of a population with clustered structure (such as a population of students within schools) is σT, and that this total variance is decomposable into a between-cluster variance σB and a within-cluster variance σW, so that σT = σB + σW. Then the variance of the mean of a simple random sample of size mn from that population would be σT/mn. However, the variance of the mean of a sample of m clusters, each of size n, from that population (with the same total sample size mn) would be [1 + (n – 1)ρ]σT/mn, where ρ = σB/(σB + σW) is the intraclass correlation. Thus the variance of the mean computed from a clustered sample is larger by a factor of [1 + (n – 1)ρ], which is often called the design effect (Kish, 1965) or variance inflation factor (Donner, Birkett, and Buck, 1981). Several analysis strategies for cluster randomized trials are possible, but the simplest is to treat the clusters as the units of analysis: that is, to compute mean scores on the outcome (and on all other variables that may be involved in the analysis) and carry out the statistical analysis as if the site (cluster) means were the data. If all cluster sample sizes are equal, this approach provides exact tests for the treatment effect, but the tests may have lower statistical power than would be obtained by other approaches (see, e.g., Blair and Higgins, 1986).
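The design effect arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names and the example values (20 schools of 60 students, ρ = 0.20) are hypothetical illustrations rather than values from the tables, and variances are expressed in units of σT.

```python
def design_effect(n: int, rho: float) -> float:
    """Variance inflation factor 1 + (n - 1) * rho for clusters of size n."""
    return 1.0 + (n - 1) * rho


def variance_of_mean(sigma_t: float, m: int, n: int, rho: float = 0.0) -> float:
    """Variance of the mean of m clusters of size n.

    sigma_t is the total variance sigma_B + sigma_W. With rho = 0 this is
    the simple-random-sample result sigma_t / (m * n).
    """
    return design_effect(n, rho) * sigma_t / (m * n)


def effective_sample_size(m: int, n: int, rho: float) -> float:
    """Size of a simple random sample with the same variance of the mean."""
    return m * n / design_effect(n, rho)


if __name__ == "__main__":
    # Hypothetical planning values: 20 schools, 60 students each, rho = 0.20.
    m, n, rho = 20, 60, 0.20
    srs = variance_of_mean(1.0, m, n)
    clustered = variance_of_mean(1.0, m, n, rho)
    print(f"design effect: {design_effect(n, rho):.1f}")          # 12.8
    print(f"variance ratio: {clustered / srs:.1f}")                # 12.8
    print(f"effective n: {effective_sample_size(m, n, rho):.2f}")  # 93.75
```

With these illustrative inputs, 1,200 clustered students carry about as much information about the mean as 94 students drawn by simple random sampling.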
More flexible and informative analyses are also available, including analyses of variance using clusters as a nested factor (see, e.g., Hopkins, 1982) and analyses involving hierarchical linear models (see, e.g., Raudenbush and Bryk, 2002). For general discussions of the design and analysis of cluster randomized experiments, see Murray (1998), Bloom, Bos, and Lee (1999), Donner and Klar (2000), Klar and Donner (2001), Raudenbush and Bryk (2002), Murray, Varnell, and Blitstein (2004), or Bloom (2005). Wise experimental design involves planning sample sizes so that the test for treatment effects has adequate statistical power to detect the smallest treatment effects that are of scientific or practical interest. There is an extensive literature on the computation of statistical power (e.g., Cohen, 1977; Kraemer and Thiemann, 1987; Lipsey, 1990). Much of this literature involves the computation of power in studies that use simple random samples. However, methods are also available for computing the statistical power of tests for treatment effects that use the cluster mean as the unit of analysis (Blair and Higgins, 1986), analysis of variance with clusters as a nested factor (Raudenbush, 1997), and hierarchical linear model analyses (Snijders and Bosker, 1993). For all of these analyses, the noncentrality parameter required to compute statistical power involves the intraclass correlation ρ. More complex analyses involving covariates require corresponding information (covariate effects, or the conditional intraclass correlations after adjustment for covariates). Thus the computation of statistical power in cluster randomized trials requires knowledge of the intraclass correlation ρ. Because plausible values of ρ are essential for power and sample size computations in planning cluster randomized experiments, there have been systematic efforts to obtain information about reasonable values of ρ in realistic situations.
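To make the role of ρ in the noncentrality parameter concrete, the sketch below computes approximate two-sided power for a balanced two-arm trial that randomizes m schools of n students to each arm and analyzes school means. It assumes the effect size δ is expressed in units of the total standard deviation, and it replaces the noncentral t distribution (with 2m − 2 degrees of freedom) by a normal approximation, which slightly overstates power when m is small. The function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def noncentrality(delta: float, m: int, n: int, rho: float) -> float:
    """Noncentrality for a two-arm trial with m clusters of size n per arm.

    delta is the standardized effect (difference in means divided by the
    total standard deviation); rho is the intraclass correlation.
    """
    return delta * sqrt(m * n / (2.0 * (1.0 + (n - 1) * rho)))


def approx_power(delta: float, m: int, n: int, rho: float,
                 alpha: float = 0.05) -> float:
    """Two-sided power using a normal approximation to the noncentral t."""
    lam = noncentrality(delta, m, n, rho)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(lam - z_crit) + STD_NORMAL.cdf(-lam - z_crit)


if __name__ == "__main__":
    # Hypothetical design: delta = 0.25, 60 students per school.
    for rho in (0.10, 0.20):
        for m in (10, 20, 40):
            power = approx_power(0.25, m, 60, rho)
            print(f"rho={rho:.2f} m={m:>2}: power ~ {power:.2f}")
```

Because ρ multiplies n − 1, the squared noncentrality approaches δ²m/(2ρ) as n grows, so adding students per school soon stops helping and the number of schools m drives power.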
One strategy for obtaining information about reasonable values of ρ is to extract these values from cluster randomized trials that have already been conducted. Murray and Blitstein (2003) reported a summary of intraclass correlations obtained from 17 articles reporting cluster randomized trials in psychology and public health, and Murray, Varnell, and Blitstein (2004) give references to 14 very recent studies that provide data on intraclass correlations for health-related outcomes. Another strategy is to analyze sample surveys that have used a cluster sampling design involving the clusters of interest. Gulliford, Ukoumunne, and Chinn (1999) and Verma and Lee (1996) presented values of intraclass correlations based on surveys of health outcomes. There is much less information about intraclass correlations appropriate for studies with academic achievement as an outcome. Such information is badly needed to inform the design of experiments that measure the effects of interventions on academic achievement by randomizing schools (Schochet, 2005). One compendium of intraclass correlation values, based on five large urban school districts where randomized trials have been conducted, has recently become available (see Bloom, Richburg-Hayes, and Black, 2005). The purpose of this paper is to provide a comprehensive collection of intraclass correlations of academic achievement based on nationally representative samples. We hope that this compilation will be useful in choosing reference values for planning cluster randomized experiments.

Dimensions of Designs Considered

Our analyses focused on intraclass correlations for designs involving assignment of schools to treatments. Unfortunately, there is a wide variety of designs that might be used to study educational interventions, and each of these designs may have its own intraclass correlation (or conditional intraclass correlation) structure.
To provide reasonable coverage of the designs most likely to be of interest to researchers planning educational experiments, we considered four dimensions of intervention designs. The first dimension is the grade level. The second is the achievement domain (e.g., reading or mathematics) used as the dependent variable. The third is the set of covariates, if any, used in the analysis. Finally, the fourth is the socioeconomic (SES) or achievement status of the schools sampled from the overall population of schools. These four dimensions can vary independently, and we examined all possible combinations of them.

Grade level of students and achievement domain. We examined each grade level from Kindergarten through grade 12, and both mathematics and reading achievement at each grade level, with one exception: reading achievement at grade 11, for which data on a nationally representative sample were not available to us.

Covariates used in the design. We consider four data analysis models involving different covariate sets that we believe are likely to be of considerable interest to educational researchers. The first, the unconditional model, involves testing of treatment effects with no covariates. This is the minimal design, but one that is likely to be of interest in many settings where the researcher has little opportunity to collect prior information about the individuals participating in the experiment. The second model, which we call the conditional model, involves testing of treatment effects conditional on covariates that are ascriptive characteristics of students frequently invoked in models of educational achievement, namely gender, race/ethnicity, and socioeconomic status.
This design may be appropriate when the researcher can obtain prior, contemporaneous, or retrospective data from administrative records (retrospective data are appropriate because these covariates are unlikely to change). The third model, which we call the residualized gain model, involves testing of treatment effects using pretest scores on the same achievement domain (mathematics or reading) as a covariate. This design is likely to be considerably more powerful than the previous designs, but involves the additional cost of collecting another wave of test data and the additional organizational burden of collecting those data in a timely manner. The fourth model, which we call the conditional residualized gain model, involves testing of treatment effects using both the ascriptive characteristics of students (gender, race/ethnicity, and socioeconomic status) and pretest scores on the same achievement domain as covariates. This design combines the covariate sets of the two previous designs.

SES or achievement status of schools within their settings. Some experimenters undoubtedly wish to use a representative sample of schools within whatever setting they choose to study. Consequently, one population of schools we considered was the entire collection of schools within a setting. Researchers sometimes decide to carry out their studies in schools that lie within the middle range of outcomes, omitting schools that have had (or are reputed to have had) the very poorest and the very best outcomes, on the rationale that neither the very poorest nor the very best schools give a fair test of an intervention. We operationalized this notion by ordering the entire sample of schools in a setting on average achievement and selecting the middle 80% of the schools, omitting the top and bottom 10%. Some interventions are designed to be compensatory.
Experimenters investigating such interventions might choose only schools within a particular context that have low mean achievement or large numbers of low-SES students. We operationalized low achievement by ordering the entire sample of schools in a setting on average achievement and selecting the lower 50% of the schools. We operationalized low SES by ordering the entire sample of schools in a setting on the proportion of students eligible for free or reduced price lunch and selecting the upper 50% of the schools.

Datasets Used

The object of this paper is to estimate intraclass correlations and associated variance components for academic achievement in reading and mathematics for the United States and various subpopulations. Consequently, we relied on data from longitudinal surveys with national probability samples, all of which are described in detail elsewhere. We chose longitudinal surveys because we wished to use achievement data collected in earlier years as pretest data for estimating the conditional intraclass correlations relevant to planning studies that would use a pretest as a covariate. In some cases, more than one survey could have provided data on a given grade level; in such cases, we report here results based on the survey with the largest sample size. When it was possible to estimate intraclass correlations for the same grade and achievement domain from more than one survey, we computed estimates from every such survey. Generally, we found that the results agreed within sampling error. The exception was that estimates from the second and third follow-ups of the Prospects samples tended to be the least consistent with other estimates. This finding makes sense in light of two principles.
The first is that longitudinal studies suffer from attrition and lose their representative character over time, so that follow-up waves, particularly second and third follow-ups, no longer represent exactly the same population. The second is the more arguable principle that the Prospects study had larger differential (non-random) attrition than the other longitudinal studies considered here (which seems to be supported by analyses of attrition). The results reported for Kindergarten, grade 1, and grade 3 were obtained from three waves of the Early Childhood Longitudinal Study (ECLS). The ECLS is a longitudinal study that obtained a national probability sample of Kindergarten children in 1591 schools in 1998 and followed them through the fifth grade (see Tourangeau, et al., 2005). Achievement test data were collected in both the Fall and Spring of Kindergarten and first grade, and in the Spring only of third and fifth grades; there was no data collection in second and fourth grades. Thus Fall achievement test data collected in the same year could serve as a pretest in Kindergarten and first grade, while data collected in the Spring of first grade served as pretest data for third grade. The results reported for grade 2 were obtained from the first follow-up of the first-grade (base year) sample in the Prospects study, and those reported for grades 4 to 6 from the three follow-ups of its third-grade (base year) sample. The results in reading in grades 7 and 9 were obtained from the base year and the second follow-up of the seventh-grade Prospects sample. Prospects was actually a set of three longitudinal studies, starting in 1991 with (base year) national probability samples of children in 235, 240, and 137 schools in grades 1, 3, and 7, respectively (for a complete description of the study design, see Puma, et al., 1997).
Achievement test data were collected for three to four years thereafter for each sample. Thus the three Prospects studies collected data in grades 1 (both Fall and Spring), 2, and 3; grades 3, 4, 5, and 6; and grades 7, 8, and 9. There were pretest data in the base year for grade 1, but no pretest data for the base years in grades 3 and 7. For all years except the base year, the previous year's achievement test data were used as a pretest, and in grade 1 the test data collected in the Fall served as a pretest. The results reported on reading in grades 8, 10, and 12 and mathematics in grades 10 and 12 were obtained from the National Education Longitudinal Study of 1988 (NELS:88). NELS:88 is a longitudinal study that began in 1988 with a national probability sample of eighth graders in 1050 schools and collected reading and mathematics achievement test data when the students were in grades 8, 10, and 12. Thus no pretest data were available for grade 8, but the grade 8 data were used as a pretest for grade 10, and the grade 10 data were used as a pretest for grade 12. Finally, the results on mathematics in grades 7, 8, 9, and 11 were obtained from the base year and follow-ups of the Longitudinal Study of American Youth (LSAY) (see Miller, et al., 1992). The LSAY is a longitudinal study that began in 1987 with two national probability samples, one of seventh graders in and one of tenth graders in 104 schools. Data were collected on mathematics and science achievement each year for four years, leading to samples from grades 7 to 12. There were no pretest data in grade 7, but the previous year's data served as the pretest for each subsequent year.

Analysis Procedures

The data analysis was carried out using the "XTMIXED" routine for mixed linear model analysis in STATA version 9.1.
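The REML fits themselves are not reproduced here. As a rough, self-contained stand-in for the unconditional model described below, the classical one-way ANOVA estimator of σB, σW, and ρ can be computed directly when every school contributes the same number of students; for such balanced data it agrees with REML whenever the between-school estimate is positive. This is a swapped-in estimator, not the procedure used in this paper: it ignores unequal cluster sizes, covariates, and sampling weights, and the function name is hypothetical.

```python
from statistics import fmean


def anova_icc(groups: list[list[float]]) -> tuple[float, float, float]:
    """One-way ANOVA estimates (sigma_B, sigma_W, rho) for equal-sized clusters.

    groups holds one list of outcome scores per school. All lists must have
    the same length n >= 2, and there must be at least two schools.
    """
    if len(groups) < 2:
        raise ValueError("need at least two clusters")
    m = len(groups)
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("clusters must all have the same size n >= 2")
    cluster_means = [fmean(g) for g in groups]
    grand_mean = fmean(cluster_means)  # equals the overall mean when balanced
    ms_between = n * sum((c - grand_mean) ** 2 for c in cluster_means) / (m - 1)
    ms_within = sum((y - c) ** 2
                    for g, c in zip(groups, cluster_means)
                    for y in g) / (m * (n - 1))
    sigma_b = max((ms_between - ms_within) / n, 0.0)  # negative estimates set to 0
    sigma_w = ms_within
    return sigma_b, sigma_w, sigma_b / (sigma_b + sigma_w)


if __name__ == "__main__":
    # Toy data: three schools of three students each.
    sigma_b, sigma_w, rho = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(f"sigma_B={sigma_b:.3f} sigma_W={sigma_w:.3f} rho={rho:.3f}")
```

For the toy data, the mean squares are 27 and 1, so ρ = 26/29, or about 0.897.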
For each sample and achievement domain, analyses were carried out based on four different models, which we call the unconditional model, the residualized gain model, the conditional model, and the conditional residualized gain model. We describe these explicitly below in hierarchical linear model notation.

The unconditional model. The unconditional model involves no covariates at either the individual or school (cluster) level. The level-one model for the kth observation in the jth school can be written as

$$Y_{jk} = \beta_{0j} + \varepsilon_{jk},$$

and the level-two model for the intercept is

$$\beta_{0j} = \pi_{00} + \zeta_j,$$

where εjk is an individual-level residual and ζj is a random effect of the jth cluster (a level-two residual). The variance components associated with this analysis are σW (the variance of the εjk) and σB (the variance of the ζj).

The residualized gain model. If pretest scores on achievement are available, they can be a powerful covariate and considerably increase power in experimental designs. The residualized gain model uses the cluster-centered pretest score at the individual level and the school mean pretest score at the school level. Thus the level-one model for the kth observation in the jth school can be written as

$$Y_{jk} = \beta_{0j} + \beta_{1j}(X_{jk} - \bar{X}_{j\bullet}) + \varepsilon_{jk},$$

and the level-two model for the intercept is

$$\beta_{0j} = \pi_{00} + \pi_{01}\bar{X}_{j\bullet} + \zeta_j,$$

where Xjk is the achievement pretest score for the kth observation in the jth school, X̄j• is the pretest mean for the jth school, εjk is an individual-level residual, and ζj is a random effect of the jth school (a level-two residual). The covariate slope β1j was treated as equal in all clusters (schools). The variance components associated with this analysis are σAW (the variance of the εjk) and σAB (the variance of the ζj).

The conditional model.
Sometimes pretest scores are not available, but other background information about individuals is available to serve as covariates. The conditional model includes four covariates at each of the individual and group (cluster) levels. At the individual level, the covariates are dummy variables for male gender and for Black and Hispanic status, and an index of mother's and father's level of education as a proxy for socioeconomic status. As recommended by Raudenbush and Bryk (2002), each of these individual-level covariates was group centered. The school-level covariates were the means of the individual-level variables for each school (cluster). Therefore the level-one model for the kth observation in the jth school can be written as

$$Y_{jk} = \beta_{0j} + \beta_{1j}(G_{jk} - \bar{G}_{j\bullet}) + \beta_{2j}(B_{jk} - \bar{B}_{j\bullet}) + \beta_{3j}(H_{jk} - \bar{H}_{j\bullet}) + \beta_{4j}(E_{jk} - \bar{E}_{j\bullet}) + \varepsilon_{jk},$$

where Gjk, Bjk, and Hjk are dummy variables for male gender, Black, and Hispanic status, respectively, Ejk is an index of mother's and father's level of education (a proxy for family SES), and Ḡj•, B̄j•, H̄j•, and Ēj• are the means of G, B, H, and E in the jth school (cluster). The level-two model for the intercept is

$$\beta_{0j} = \pi_{00} + \pi_{10}\bar{G}_{j\bullet} + \pi_{20}\bar{B}_{j\bullet} + \pi_{30}\bar{H}_{j\bullet} + \pi_{40}\bar{E}_{j\bullet} + \zeta_j,$$

and the covariate slopes β1j, β2j, β3j, and β4j were treated as equal in all clusters (schools). The variance components associated with this analysis are σAW (the variance of the εjk) and σAB (the variance of the ζj).

The conditional residualized gain model. The conditional residualized gain model combines the use of an achievement pretest with the individual characteristics of gender, minority group status, and parents' education as individual- and school-level covariates.
Therefore the level-one model for the kth observation in the jth school can be written as

$$Y_{jk} = \beta_{0j} + \beta_{1j}(X_{jk} - \bar{X}_{j\bullet}) + \beta_{2j}(G_{jk} - \bar{G}_{j\bullet}) + \beta_{3j}(B_{jk} - \bar{B}_{j\bullet}) + \beta_{4j}(H_{jk} - \bar{H}_{j\bullet}) + \beta_{5j}(E_{jk} - \bar{E}_{j\bullet}) + \varepsilon_{jk},$$

where all of the symbols are defined as in the models above. The level-two model for the intercept is

$$\beta_{0j} = \pi_{00} + \pi_{10}\bar{X}_{j\bullet} + \pi_{20}\bar{G}_{j\bullet} + \pi_{30}\bar{B}_{j\bullet} + \pi_{40}\bar{H}_{j\bullet} + \pi_{50}\bar{E}_{j\bullet} + \zeta_j,$$

and the covariate slopes β1j, β2j, β3j, β4j, and β5j were treated as equal in all clusters (schools). The variance components associated with this analysis are σAW (the variance of the εjk) and σAB (the variance of the ζj).

The Intraclass Correlation Data

The (unconditional) intraclass correlation associated with the unconditional model described above is

$$\rho = \frac{\sigma_B}{\sigma_B + \sigma_W} = \frac{\sigma_B}{\sigma_T}, \qquad (1)$$

where σT = σB + σW is the (unconditional) total variance. Note that the residuals εjk and ζj correspond to the within- and between-cluster random effects in an experiment that assigned schools to treatments. Consequently, the variance components associated with these random effects, and the intraclass correlation, correspond to those in a cluster randomized experiment that assigned schools to treatments and analyzed the data with no covariates. In the three models involving covariate adjustment, the (covariate-adjusted) intraclass correlation is

$$\rho_A = \frac{\sigma_{AB}}{\sigma_{AB} + \sigma_{AW}} = \frac{\sigma_{AB}}{\sigma_{AT}}, \qquad (2)$$

where σAT = σAB + σAW is the (covariate-adjusted) total variance. Note that the residuals εjk and ζj correspond to the within- and between-cluster random effects in an experiment that assigned schools to treatments and used the same covariates as were used in the models with covariates.
Consequently, the variance components associated with these random effects and the conditional intraclass correlation ρA correspond to those in a cluster randomized experiment that assigned schools to treatments and analyzed the data with these (individual and school mean) characteristics as covariates. For each combination of design dimensions (that is, for each grade level, achievement domain, covariate set, setting, and choice of SES/achievement status within setting), we estimated the intraclass correlation (or conditional intraclass correlation) via restricted maximum likelihood using STATA and computed the standard error of that estimate using the result given in Donner and Koval (1982). This resulted in 13 (grade levels) × 2 (achievement domains) × 4 (covariate sets) × 4 (SES/achievement statuses within settings) = 416 intraclass correlation estimates, each with a corresponding standard error. For designs that employ covariates, we also provide values of

$$\eta_B = \sigma_{AB}/\sigma_B, \qquad (3)$$

the proportion of the between-school variance that remains after covariate adjustment, and

$$\eta_W = \sigma_{AW}/\sigma_W, \qquad (4)$$

the proportion of the within-school variance that remains after covariate adjustment. For designs involving covariates, these two auxiliary quantities are useful in computing statistical power; their use is illustrated in a subsequent section of this paper. Two alternative parameters that contain the same information as ηB and ηW are RB = 1 – ηB and RW = 1 – ηW, the proportions of between- and within-group variance explained by the covariates. We chose to tabulate the η values instead of the R values because the relation of the η values to the noncentrality parameters used in power analysis is simpler. Note that each of the four analyses involved slightly different variables, and there were missing values on some of these variables in our survey data.
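Because σAB = ηBσB and σAW = ηWσW, the quantities ρ, ηB, and ηW together determine the adjusted intraclass correlation: dividing equation (2) through by σT gives ρA = ηBρ / [ηBρ + ηW(1 − ρ)]. The sketch below encodes this identity and the R = 1 − η conversion; the function names and example values are hypothetical. Since the tabulated η values were computed on the cases used in the adjusted fits, which need not be the cases behind the tabulated ρ, this identity may not reproduce a tabulated ρA exactly.

```python
def adjusted_icc(rho: float, eta_b: float, eta_w: float) -> float:
    """Covariate-adjusted ICC implied by rho, eta_B and eta_W.

    Uses sigma_AB = eta_B * sigma_B and sigma_AW = eta_W * sigma_W, with both
    components expressed as fractions of the unadjusted total sigma_T, so that
    sigma_B / sigma_T = rho and sigma_W / sigma_T = 1 - rho.
    """
    between = eta_b * rho
    within = eta_w * (1.0 - rho)
    return between / (between + within)


def variance_explained(eta: float) -> float:
    """R = 1 - eta: share of a variance component explained by the covariates."""
    return 1.0 - eta


if __name__ == "__main__":
    # Hypothetical values: rho = 0.20; covariates leave 25% of the
    # between-school and 50% of the within-school variance.
    rho, eta_b, eta_w = 0.20, 0.25, 0.50
    print(f"rho_A = {adjusted_icc(rho, eta_b, eta_w):.4f}")  # 0.1111
    print(f"R_B = {variance_explained(eta_b):.2f}, "
          f"R_W = {variance_explained(eta_w):.2f}")
```

Scaling ηB and ηW by the same factor (for example 0.5 and 1.0 instead of 0.25 and 0.5) leaves ρA unchanged. This is why ρA alone does not suffice for power calculations with covariates.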
We decided to compute each analysis on the largest set of cases that had all of the variables necessary for the analysis in question. This means that each of the four analyses of a given dataset is computed on a slightly different set of cases. Because the quantities ηW and ηB involve a comparison of two different analyses (one with and one without a particular set of covariates), we believed it was important to make this comparison using estimates derived from exactly the same set of cases. Consequently, for each of the analyses that involved covariates, we re-computed the estimates of the unadjusted variance components, σW and σB, using only the cases that were used to compute the adjusted variance components σAW and σAB, and used these particular estimates to compute the ηW and ηB values given here. Although we provide estimates of the standard errors of the intraclass correlations, they should be used with some caution for two reasons. First, the distribution of estimates of the intraclass correlations is only approximately normal. Second, not all of these values are independent of one another, and it is not immediately clear how to carry out a formal statistical analysis of differences between estimates of intraclass correlations computed from the same sample of individuals. Nevertheless, we feel that these standard errors are useful as descriptions of the uncertainty of the individual estimates of intraclass correlations.

Results

We found that the intraclass correlations obtained in the nationally representative sample and in the schools in the middle 80% of the achievement distribution were almost identical.
Consequently, we present here only the intraclass correlation data from the entire national sample of schools, from those in the upper half of the free and reduced price lunch distribution (low SES schools), and from those in the lower half of the school mean achievement distribution (low achievement schools).

Mathematics achievement in the full population. Table 1 presents results from the entire national sample in mathematics. The table is divided into four panels of three columns each, one panel for each of the four analyses described above. The data for each grade level are given in a different row. In the row for each grade, the columns of each panel provide the estimate of the intraclass correlation (ρ), the standard error of the estimate of ρ (in parentheses after the estimate of ρ), and (for all but the unconditional model given in the first panel on the left-hand side) estimates of ηB and ηW. For example, consider the data for the residualized gain model for grade 1, given in the third panel of the table. On the row associated with grade 1, the values in the columns of the third panel (columns 8 to 11 of the table) are 125, 13.5, 177, and 376, respectively, which correspond to estimates of 0.125, 0.0135, 0.177, and 0.376 for ρA, the standard error of the estimate of ρA, ηB, and ηW. Although the intraclass correlations tend to be larger at lower grades, in general there are no large changes across adjacent grade levels; few of these differences exceed two standard errors of the difference. A notable exception is the unadjusted intraclass correlation at grade 11, where the difference between grade 11 and either of the adjacent grades is about three standard errors of the difference.
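The grade-to-grade comparisons above are phrased in standard errors of the difference. A minimal sketch follows, assuming that the two estimates are independent (the caution about standard errors earlier in this paper suggests this holds only approximately) and that table entries are reported multiplied by 1000, as in the grade 1 example. The function names and the neighbouring-grade values are hypothetical.

```python
from math import sqrt


def from_table(value: float) -> float:
    """Convert a tabulated entry (reported x 1000) to its actual value."""
    return value / 1000.0


def difference_in_se_units(est_1: float, se_1: float,
                           est_2: float, se_2: float) -> float:
    """(est_1 - est_2) / sqrt(se_1**2 + se_2**2), assuming independence."""
    return (est_1 - est_2) / sqrt(se_1 ** 2 + se_2 ** 2)


if __name__ == "__main__":
    # Grade 1 residualized gain entries quoted in the text: 125 and 13.5.
    rho_a, se_a = from_table(125), from_table(13.5)
    # Hypothetical neighbouring grade, for illustration only.
    rho_b, se_b = 0.170, 0.015
    z = difference_in_se_units(rho_a, se_a, rho_b, se_b)
    print(f"difference = {z:.2f} standard errors of the difference")
```

A magnitude above 2 corresponds to the "more than two standard errors of the difference" language used in the text.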
None of the differences between adjusted intraclass correlations in adjacent grades is as large as three standard errors of the difference, but the values for grade 2 are somewhat higher (by over two standard errors of the difference), and those for grade 3 somewhat lower, than those of adjacent grades. The patterns of reduction in between- and within-cluster (school) variance generally differ considerably across these models. Specifically, the conditional analyses typically reduced the between-cluster variance to one-half to one-quarter of its value in the unconditional model (i.e., produced ηB values from 0.5 to 0.25), but typically reduced the within-cluster variance by 10% or less (i.e., produced ηW values greater than 0.9). The residualized analyses using the pretest score as a covariate typically resulted in larger reductions in between-cluster variance (i.e., produced ηB values from 0.3 to 0.1), but also typically reduced the within-cluster variance by a much larger amount than the conditional model (i.e., produced ηW values from 0.25 to 0.5). Different patterns of variance reduction have quite different implications for statistical power, even if they correspond to the same adjusted intraclass correlation (see the section on power computation in models with covariates).

Reading achievement in the full population. Table 2 presents results from the entire national sample in reading, organized in the same way as Table 1, which reported results for mathematics. The intraclass correlation and adjusted intraclass correlation values in reading are generally quite similar to those in mathematics. As in mathematics, the intraclass correlations in reading tend to become smaller at higher grades, but the changes across adjacent grade levels are often larger than in mathematics. The results for grade 9 are particularly inconsistent with (having larger values of the intraclass correlations than) the results from either grade 8 or grade 10.
The results for grade 2 also differ somewhat from those for grade 1 and grade 3, having smaller intraclass correlations than either. Several of these differences exceed three standard errors of the difference; few of the other differences exceed two. There is less consistency in reading than in mathematics among the adjusted intraclass correlations for the three models involving covariates. However, the general pattern of reduction in between- versus within-cluster variance was similar in reading and in mathematics. That is, there was somewhat greater reduction in between-cluster variance, and much greater reduction in within-cluster variance, in the residualized model than in the conditional model.

Mathematics achievement in low SES schools. Table 3 presents results in mathematics computed for the schools in the bottom half of the school SES distribution (operationalized by the proportion of students eligible for free or reduced price lunch). The intraclass correlation values in this sample appear to be slightly smaller than those reported in Table 1 for the entire national population, a tendency that does not hold for the conditional (adjusted) intraclass correlations. The pattern of variation in the mathematics intraclass correlations and conditional intraclass correlations across regions, urbanicity of school setting, and regions crossed with urbanicity in the low SES school sample was similar to that in all schools.

Reading achievement in low SES schools. Table 4 presents results in reading computed for the schools in the bottom half of the school SES distribution (operationalized by the proportion of students eligible for free or reduced price lunch).
As in the case of mathematics, there appears to be a slight tendency for the intraclass correlation values in this sample to be smaller than those reported in Table 2 for the entire national population, a tendency that does not hold for the conditional (adjusted) intraclass correlations. The pattern of variation in the reading intraclass correlations and conditional intraclass correlations across regions, urbanicity of school setting, and regions crossed with urbanicity in the low SES school sample was similar to that in all schools.
Mathematics achievement in low achievement schools. Table 5 presents results in mathematics computed for the schools in the bottom half of the distribution of school mean mathematics achievement. The intraclass correlation values in this sample are considerably smaller than those reported in Table 1 for the entire national population, a tendency that also holds for the conditional (adjusted) intraclass correlations. There is some variation of intraclass correlations across grade levels, but only the difference between grades 4 and 5 is larger than two standard errors of the difference. In general, the intraclass correlations in Kindergarten through grade 4 range from about 0.09 to 0.13, in grades 5 through 7 they range from about 0.05 to 0.08, and in grades 8 through 12 they range from about 0.075 to 0.085. The use of covariates resulted in a much smaller reduction in both between- and within-school variances in this sample than in the unrestricted sample. Specifically, the conditional analyses typically reduced the between-school variance to no less than one-half of its value in the unconditional model (e.g., produced ηB values from 0.5 to 0.8), but typically reduced within-cluster variance by 5% or less (e.g., produced ηW values greater than 0.95).
The residualized analyses using pretest score as a covariate typically (but not always) resulted in modestly larger reductions in between-cluster variance (e.g., produced ηB values from 0.3 to 0.8), but typically reduced within-cluster variance by a larger amount than the conditional model did (e.g., produced ηW values from 0.5 to 0.8). Thus we find that the intraclass correlation is smaller in this sample, but the explanatory power of pretest and other covariates is also smaller. These two tendencies have opposite effects on statistical power: the smaller intraclass correlation generally leads to larger statistical power, but the smaller explanatory power of covariates generally leads to smaller statistical power, one partially offsetting the effect of the other.
Reading achievement in low achievement schools. Table 6 presents results in reading computed for the schools in the bottom half of the distribution of school mean reading achievement. As in the case of mathematics, the intraclass correlation values in this sample are considerably smaller than those reported in Table 2 for the entire national population, a tendency that also holds for the conditional (adjusted) intraclass correlations. There is some variation of intraclass correlations across grade levels. The intraclass correlation in grade 9 is larger (by over three standard errors of the difference) than that in either of the adjacent grades. Similarly, the intraclass correlation in grade 1 is more than two standard errors of the difference greater than that in Kindergarten, but differs from that in grade 2 by less than two standard errors of the difference. None of the other differences between grades is this large in comparison to its uncertainty. In general, the intraclass correlations in Kindergarten through grade 4 range from about 0.10 to 0.14, in grades 5 through 8 they range from about 0.06 to 0.07, and in grades 10 through 12 they are about 0.05.
As in the case of mathematics, the use of covariates resulted in a much smaller reduction in both between- and within-school variances in this sample than in the national sample. Specifically, the conditional analyses typically reduced the between-school variance to no less than one-half of its value in the unconditional model (e.g., produced ηB values from 0.5 to 0.8), but typically reduced within-cluster variance by 5% or less (e.g., produced ηW values greater than 0.95). The residualized analyses using pretest score as a covariate typically (but not always) resulted in modestly larger reductions in between-cluster variance (e.g., produced ηB values from 0.3 to 0.8), but typically reduced within-cluster variance by a larger amount than the conditional model did (e.g., produced ηW values from 0.5 to 0.8). Thus we find, as in the case of mathematics, that the intraclass correlation is smaller in this sample but the explanatory power of pretest and other covariates is also smaller, each difference partially offsetting the effect of the other on statistical power.
Minimum Detectable Effect Sizes
One way to summarize the implications of these results for statistical power is to use them to compute the smallest effect size for which a target design would have adequate statistical power. This effect size is often called the minimum detectable effect size (MDES; see Bloom, 1995, 2005). In computing the MDES values reported in this paper, we defined adequate power as power of 0.80 for a two-sided test at significance level 0.05. We considered designs with no covariates and with pretest as a covariate at both the individual and group level. We considered both reading and mathematics achievement as potential outcomes. Finally, we considered a balanced design with a sample size of n = 60 students per school and m = 10, 15, 20, 25, or 30 schools randomized to each treatment group.
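The MDES calculation described above can be sketched as follows. This is a minimal sketch, not the authors' code: it uses Bloom's (1995) multiplier approximation, MDES ≈ (t at 1 − α/2 + t at the target power, both with df = 2m − q − 2) × SE(δ̂), with SE(δ̂) = √(2[ρηB + (1 − ρ)ηW/n]/m) in units of the unadjusted standard deviation, rather than an exact noncentral-t calculation. The t quantiles are computed numerically (Simpson's rule plus bisection) so that only the Python standard library is needed. The helper names, ρ = 0.20, and the η values for pretest are hypothetical illustrations, not entries from Tables 1 through 6.

```python
import math


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t with df degrees of freedom (x >= 0 only).

    Simpson's rule over [0, x] (steps must be even) plus the 0.5 mass below 0.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile of Student's t for 0.5 <= p < 1, by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mdes(m, n, rho, eta_b=1.0, eta_w=1.0, q=0, alpha=0.05, power=0.80):
    """Approximate MDES (in unadjusted SD units) for a balanced two-arm design.

    m: clusters per arm; n: individuals per cluster; rho: unadjusted ICC;
    eta_b, eta_w: fractions of between- and within-cluster variance remaining
    after covariate adjustment; q: number of cluster-level covariates.
    Uses Bloom's multiplier t_{1-alpha/2, df} + t_{power, df}, df = 2m - q - 2.
    """
    df = 2 * m - q - 2
    multiplier = t_quantile(1 - alpha / 2, df) + t_quantile(power, df)
    se = math.sqrt(2 * (rho * eta_b + (1 - rho) * eta_w / n) / m)
    return multiplier * se


# Hypothetical inputs, not values from Tables 1-6: rho = 0.20, and a pretest
# leaving 20% of between- and 40% of within-school variance (q = 1).
for m in (10, 15, 20, 25, 30):
    no_cov = mdes(m, n=60, rho=0.20)
    pretest = mdes(m, n=60, rho=0.20, eta_b=0.20, eta_w=0.40, q=1)
    print(f"m = {m:2d}: no covariates {no_cov:.2f}, pretest {pretest:.2f}")
```

With these hypothetical inputs the no-covariate MDES at m = 10 comes out near 0.61. An exact noncentral-t power calculation would give slightly different values, especially for small m, so this sketch is best treated as a quick planning approximation.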
Table 7 gives the minimum detectable effect sizes based on the parameters given in Tables 1 and 2, which were estimated from the full national sample. Perhaps the most obvious finding is that the corresponding MDES values for mathematics and reading are quite similar. With no covariates, the MDES values typically exceed 0.60 for m = 10 and typically exceed 0.35 even for m = 30. However, the use of pretest as a covariate reduces the MDES values to less than 0.40 for m = 10 and to 0.20 or less for m = 30. Although there is no universally adequate standard for evaluating the importance of effect sizes, applying Cohen's (1977) widely used labels of 0.20 as small and 0.50 as medium would imply that, with pretest as a covariate, an experiment randomizing m = 10 schools to each treatment should be adequate to detect effects of "medium" size and that an experiment randomizing m = 30 schools to each treatment should be adequate to detect effects of "small" size. Table 8 gives the minimum detectable effect sizes based on the parameters given in Tables 3 and 4, which were estimated from the national sample of low SES schools. These results are remarkably similar to those in Table 7. Table 9 gives the minimum detectable effect sizes based on the parameters given in Tables 5 and 6, which were estimated from the national sample of schools in the lower half of the achievement distribution. Because the unconditional intraclass correlations are lower, the MDES values for designs with no covariates are smaller. However, because the covariates are less effective in reducing between- and within-school variance in this sample, the MDES values with pretest as a covariate are not always smaller than in the national sample of all schools. With no covariates, the MDES values are typically less than 0.50 for m = 10 and less than 0.30 for m = 30. With pretest as a covariate, the MDES values typically fall to about 0.30 for m = 10 and to 0.20 or less for m = 30.
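Why a smaller intraclass correlation need not yield a smaller MDES once pretest is used can be seen in the bracketed term that drives the (approximate) sampling variance of the standardized effect estimate. The worked arithmetic below uses hypothetical parameter values, chosen only to fall within the ranges described above for each sample (they are not entries from Tables 1 through 6), with n = 60 and the same m and q in both samples, so the MDES is proportional to the square root of the bracket.

```latex
\[
\operatorname{Var}(\hat\delta) \approx \frac{2}{m}\left[\rho\,\eta_B + \frac{(1-\rho)\,\eta_W}{n}\right]
\]
\[
\begin{aligned}
&\text{All schools } (\rho = 0.22,\ \eta_B = 0.20,\ \eta_W = 0.40)\text{, no covariates:} && 0.22 + 0.78/60 = 0.2330\\
&\text{All schools, pretest:} && 0.22(0.20) + 0.78(0.40)/60 = 0.0440 + 0.0052 = 0.0492\\
&\text{Low achievement } (\rho = 0.10,\ \eta_B = 0.60,\ \eta_W = 0.65)\text{, no covariates:} && 0.10 + 0.90/60 = 0.1150\\
&\text{Low achievement, pretest:} && 0.10(0.60) + 0.90(0.65)/60 = 0.0600 + 0.00975 = 0.06975
\end{aligned}
\]
```

Without covariates the low-achievement bracket is about half as large (0.1150 versus 0.2330), so its MDES is smaller; with pretest the ordering reverses (0.06975 versus 0.0492), making its MDES roughly √(0.06975/0.0492) ≈ 1.19 times larger, consistent with the finding that MDES values with pretest are not always smaller in this sample.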
Using the Results of this Paper to Compute Statistical Power of Cluster Randomized Experiments
In this section, we illustrate the use of the results in this paper to compute the statistical power of cluster randomized experiments. Consider the two-treatment-group design with $q$ ($0 \le q < M - 2$) group-level (cluster-level) covariates and $p$ ($0 \le p < N - q - 2$) individual-level covariates in the analysis. Note that we specifically include the possibility that there are 0 (no) covariates at a given level. For example, a design with $p = 1$ and $q = 1$ might arise if a pretest was used as an individual-level covariate and cluster means on the pretest were used as a group-level covariate. We also assume that the individual-level covariate has been centered about cluster means. The structural model for $Y_{ijk}$, the $k$th observation in the $j$th cluster in the $i$th treatment, might be described in ANCOVA notation as
$$Y_{ijk} = \mu + \alpha_{Ai} + \boldsymbol{\theta}_I'\mathbf{x}_{ijk} + \boldsymbol{\theta}_G'\mathbf{z}_{ij} + \gamma_{A(i)j} + \varepsilon_{Aijk},$$
where $\mu$ is the grand mean, $\alpha_{Ai}$ is the covariate-adjusted effect of the $i$th treatment, $\boldsymbol{\theta}_I = (\theta_{I1}, \ldots, \theta_{Ip})'$ is a vector of $p$ individual-level covariate effects, $\boldsymbol{\theta}_G = (\theta_{G1}, \ldots, \theta_{Gq})'$ is a vector of $q$ group-level covariate effects, $\mathbf{x}_{ijk}$ is a vector of $p$ group-centered (cluster-centered) individual-level covariate values for the $k$th individual in the $j$th cluster in the $i$th treatment, $\mathbf{z}_{ij}$ is a vector of $q$ group-level (cluster-level) covariate values for the $j$th cluster in the $i$th treatment, $\gamma_{A(i)j}$ is the random effect of cluster $j$ within treatment $i$, and $\varepsilon_{Aijk}$ is the covariate-adjusted within-cell residual. Here we assume that both random effects (the cluster effects and the residuals) are normally distributed. The analysis might be carried out either as an analysis of covariance with clusters as a nested factor or by viewing the model as a hierarchical linear model and using software for multilevel models such as HLM.
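To make the structural model concrete, the sketch below draws data from it for the p = 1, q = 1 example just described: a pretest is centered about its cluster mean to form the individual-level covariate, and the cluster mean of the pretest serves as the group-level covariate. It is an illustrative simulator, not part of the authors' analysis; the function name `simulate_trial` and every parameter value (treatment effect, covariate slopes, variance components, seed) are hypothetical.

```python
import random


def simulate_trial(m, n, mu=0.0, alpha=(0.25, 0.0), theta_i=0.5, theta_g=0.8,
                   sd_cluster=0.4, sd_resid=0.9, seed=1):
    """Simulate Y_ijk = mu + alpha_i + theta_I * x_ijk + theta_G * z_ij
    + gamma_(i)j + eps_ijk for two treatment groups (i = 0, 1), m clusters
    per group, and n individuals per cluster. Group i = 1 plays the role of
    the control group, with its effect fixed at 0. All defaults are
    hypothetical. Returns a list of (i, j, k, x, z, y) tuples."""
    rng = random.Random(seed)
    rows = []
    for i in range(2):
        for j in range(m):
            level = rng.gauss(0.0, 1.0)                 # cluster's typical pretest level
            pretest = [level + rng.gauss(0.0, 1.0) for _ in range(n)]
            z = sum(pretest) / n                        # group-level covariate: cluster mean pretest
            gamma = rng.gauss(0.0, sd_cluster)          # random cluster effect gamma_(i)j
            for k in range(n):
                x = pretest[k] - z                      # individual covariate, centered at cluster mean
                eps = rng.gauss(0.0, sd_resid)          # within-cluster residual
                y = mu + alpha[i] + theta_i * x + theta_g * z + gamma + eps
                rows.append((i, j, k, x, z, y))
    return rows


data = simulate_trial(m=10, n=60)
print(len(data))   # 2 groups * 10 clusters * 60 students = 1200 rows
```

Data generated this way can be fed to either an ANCOVA with nested clusters or a multilevel model to check that an analysis recovers the (known) simulated treatment effect.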
In multilevel model notation, it would be conventional to specify a level-one (individual-level) model as
$$Y_{ijk} = \beta_{0j} + \boldsymbol{\beta}_j'\mathbf{x}_{ijk} + \varepsilon_{Aijk},$$
and a level-two (cluster-level) model for the intercept as
$$\beta_{0j} = \pi_{00} + \pi_{A01}\,\mathrm{TREATMENT}_i + \boldsymbol{\pi}_{02}'\mathbf{z}_{ij} + \zeta_{Aj},$$
where $\mathrm{TREATMENT}_i$ is a dummy variable for the treatment group, the covariate slopes in $\boldsymbol{\beta}_j$ are treated as fixed effects ($\boldsymbol{\beta}_j = \boldsymbol{\theta}_I$), and $\zeta_{Aj}$ is the random effect of the $j$th cluster (a level-two residual). With the appropriate constraints on the ANCOVA model (i.e., setting $\alpha_{Ai} = 0$ for the control group and constraining the mean of the $\gamma_{A(i)j}$ to be 0), these two models are identical, and there is a one-to-one correspondence between the parameters and the random effects in the two models. That is, $\mu = \pi_{00}$, $\alpha_{Ai} = \pi_{A01}$ (for the treatment group), $\boldsymbol{\theta}_G = \boldsymbol{\pi}_{02}$, $\boldsymbol{\theta}_I = \boldsymbol{\beta}_j$ (for all $j$), $\gamma_{A(i)j} = \zeta_{Aj}$ (with a suitable redefinition of the index $j$), and $\varepsilon_{Aijk}$ is identical in both models. The variance components associated with this analysis are $\sigma^2_{AW}$ (the variance of the $\varepsilon_{Aijk}$) and $\sigma^2_{AB}$ (the variance of the $\zeta_{Aj}$), where the A in the subscript denotes that these variance components are adjusted for the covariates.
The intraclass correlations. Note that if, in the experiment, schools were sampled at random, students were sampled at random within schools, and $q = p = 0$, then $\rho = \sigma^2_B/(\sigma^2_B + \sigma^2_W)$ is exactly the intraclass correlation that would be obtained in a survey that sampled first schools and then students at random. Similarly, if there are covariates in the experiment ($q \ne 0$ or $p \ne 0$), schools were sampled at random, and students were sampled at random within schools, then $\rho_A = \sigma^2_{AB}/(\sigma^2_{AB} + \sigma^2_{AW})$ is exactly the adjusted intraclass correlation that would be obtained in the analysis of such a survey (with the same covariates).
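The definitions of ρ and ρA can be sketched numerically. The block below (a) writes ρA in terms of ρ and the variance-reduction ratios ηB = σ²AB/σ²B and ηW = σ²AW/σ²W used in the results sections, (b) shows two hypothetical covariate sets that give the same ρA while leaving very different cluster-mean variance, which is why ρA alone does not determine power, and (c) includes a one-way ANOVA estimator of ρ for balanced survey data. The ANOVA estimator is a standard textbook choice and is not claimed to be the method used to produce the tables in this paper; all function names and numeric inputs are hypothetical.

```python
def adjusted_icc(rho, eta_b, eta_w):
    """rho_A = sigma_AB^2 / (sigma_AB^2 + sigma_AW^2), written in terms of the
    unadjusted ICC rho and the ratios eta_B = sigma_AB^2 / sigma_B^2 and
    eta_W = sigma_AW^2 / sigma_W^2."""
    between = rho * eta_b
    within = (1.0 - rho) * eta_w
    return between / (between + within)


def cluster_mean_variance(rho, n, eta_b=1.0, eta_w=1.0):
    """Variance of an (adjusted) cluster mean of n students, in units of the
    unadjusted total variance sigma_B^2 + sigma_W^2."""
    return rho * eta_b + (1.0 - rho) * eta_w / n


def anova_icc(clusters):
    """One-way ANOVA estimate of rho from balanced clusters (equal sizes n).
    Can be negative when the between-cluster mean square is small."""
    J, n = len(clusters), len(clusters[0])
    if any(len(c) != n for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    means = [sum(c) / n for c in clusters]
    grand = sum(means) / J
    ms_between = n * sum((mj - grand) ** 2 for mj in means) / (J - 1)
    ms_within = sum((y - mj) ** 2
                    for c, mj in zip(clusters, means) for y in c) / (J * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Two hypothetical covariate sets with the same adjusted ICC (rho = 0.20, n = 60).
conditional = dict(eta_b=0.40, eta_w=0.95)
residualized = dict(eta_b=0.20, eta_w=0.475)
for name, eta in [("conditional", conditional), ("residualized", residualized)]:
    print(name,
          round(adjusted_icc(0.20, **eta), 4),
          round(cluster_mean_variance(0.20, 60, **eta), 4))
```

Both hypothetical covariate sets give an adjusted intraclass correlation of about 0.095, yet the residualized set leaves exactly half the cluster-mean variance (about 0.046 versus 0.093), and hence a noticeably smaller standard error for the treatment contrast.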
Hypothesis Testing
The object of the statistical analysis is to test the statistical significance of the intervention effect, that is, to test the hypothesis $H_0\colon \alpha_{A1} - \alpha_{A2} = 0$ or, equivalently, $H_0\colon \pi_{A01} = 0$. The ANCOVA t-test statistic is
$$t_A = \frac{\sqrt{m/2}\,\left(\bar{Y}_{A1\bullet\bullet} - \bar{Y}_{A2\bullet\bullet}\right)}{S_A}, \tag{5}$$
where $m$, the number of clusters in each treatment group, is as defined above, $\bar{Y}_{A1\bullet\bullet}$ and $\bar{Y}_{A2\bullet\bullet}$ are the adjusted means, $S_A$ is the pooled within-treatment-groups adjusted standard deviation of cluster means, and the subscript A is used to connote that the means and standard deviation are adjusted for the covariates. The F-test statistic from a one-way analysis of covariance using cluster means is, of course,
$$F_A = \frac{MS_{AB}}{MS_{AC}} = t_A^2. \tag{6}$$
In this case $MS_{AB} = nm\left(\bar{Y}_{A1\bullet\bullet} - \bar{Y}_{A2\bullet\bullet}\right)^2/2$ and $MS_{AC} = nS_A^2$, where $S_A$ is the pooled within-treatment-groups standard deviation of the covariate-adjusted cluster means (the standard deviation of the level-two residuals). If the null hypothesis is true, the test statistic $t_A$ has Student's t-distribution with $M - q - 2$ degrees of freedom. Equivalently, the test statistic $F_A$ has the central F-distribution with 1 degree of freedom in the numerator and $M - q - 2$ degrees of freedom in the denominator when the null hypothesis is true. When the null hypothesis is false, the test statistic $t_A$ has a noncentral t-distribution with $M - q - 2$ degrees of freedom and noncentrality parameter
$$\lambda_A = \frac{\sqrt{mn}\,(\alpha_{A1} - \alpha_{A2})}{\sigma_{AT}\sqrt{2[1 + (n-1)\rho_A]}} = \delta_A\sqrt{\frac{mn}{2[1 + (n-1)\rho_A]}}, \tag{7}$$
where $\delta_A = (\alpha_{A1} - \alpha_{A2})/\sigma_{AT}$ and $\sigma^2_{AT} = \sigma^2_{AB} + \sigma^2_{AW}$. Alternatively (and equivalently), the F-statistic has the noncentral F-distribution with 1 degree of freedom in the numerator, $M - q - 2$ degrees of freedom in the denominator, and noncentrality parameter $\lambda_A^2$.
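Equations (5) through (7) can be checked numerically with the short sketch below. It assumes a balanced design with q = 0, so the adjusted cluster means are simply the observed cluster means and the degrees of freedom are M − 2; with covariates, the cluster means would first be adjusted and the degrees of freedom reduced to M − q − 2. The function names and the example cluster means are hypothetical.

```python
import math
from statistics import mean, variance


def cluster_mean_test(group1, group2, n):
    """t_A (equation 5) and F_A (equation 6) from cluster means.

    group1, group2: (adjusted) cluster means, m per treatment group;
    n: individuals per cluster. This sketch assumes q = 0, so df = 2m - 2.
    """
    m = len(group1)
    if len(group2) != m:
        raise ValueError("this sketch assumes a balanced design")
    diff = mean(group1) - mean(group2)
    s2 = (variance(group1) + variance(group2)) / 2   # pooled S_A^2 (equal group sizes)
    t = math.sqrt(m / 2) * diff / math.sqrt(s2)
    ms_between = n * m * diff ** 2 / 2               # MS_AB
    ms_clusters = n * s2                             # MS_AC
    return t, ms_between / ms_clusters


def noncentrality(delta_a, m, n, rho_a):
    """lambda_A from equation (7): delta_A * sqrt(mn / (2[1 + (n - 1) rho_A]))."""
    return delta_a * math.sqrt(m * n / (2 * (1 + (n - 1) * rho_a)))


# Hypothetical cluster means for m = 3 clusters per group, n = 5 students each.
t, f = cluster_mean_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], n=5)
print(round(t, 3), round(f, 3), round(t * t, 3))
print(round(noncentrality(0.5, m=10, n=60, rho_a=0.20), 3))
```

In this example F_A and t_A² both equal 13.5, as equation (6) requires. Turning λA into a power value additionally needs the noncentral t (or F) distribution function, which the Python standard library does not provide.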

Publication date: 2006